# Efficient Parameter Utilization

## TimeMoE-200M
Apache-2.0 · Maple728 · 14.01k downloads · 7 likes · Tags: Climate Model

TimeMoE-200M is a 200M-parameter member of the TimeMoE family of billion-scale time series foundation models. It is built on a Mixture of Experts (MoE) architecture and targets time series forecasting tasks.
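
As a sketch of typical usage, the snippet below follows the normalize-generate-denormalize pattern described by the upstream TimeMoE project; the repo id `Maple728/TimeMoE-200M` and the `trust_remote_code` requirement are assumptions carried over from that project, not guaranteed by this listing.

```python
import torch
from transformers import AutoModelForCausalLM

# Repo id and trust_remote_code are assumptions from the upstream project.
model = AutoModelForCausalLM.from_pretrained(
    "Maple728/TimeMoE-200M",
    device_map="cpu",
    trust_remote_code=True,
)

# Normalize the context window, forecast autoregressively, de-normalize.
seqs = torch.randn(2, 12)            # 2 series, context length 12
mean = seqs.mean(dim=-1, keepdim=True)
std = seqs.std(dim=-1, keepdim=True)
normed_seqs = (seqs - mean) / std

prediction_length = 6
output = model.generate(normed_seqs, max_new_tokens=prediction_length)
forecast = output[:, -prediction_length:] * std + mean
print(forecast.shape)                # torch.Size([2, 6])
```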
## CodeGen2.5 7B Multi
Apache-2.0 · Salesforce · 839 downloads · 139 likes · Tags: Large Language Model, Transformers

CodeGen2.5 is a series of autoregressive language models for program synthesis. It improves on CodeGen2 and is trained on StarCoderData, achieving performance competitive with larger code models at a smaller scale.
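
A minimal generation sketch with the transformers API; the repo id `Salesforce/codegen25-7b-multi_P` is inferred from this entry's name, and `trust_remote_code=True` is assumed because CodeGen2.5 uses a custom tokenizer.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Repo id inferred from the entry name; trust_remote_code assumed for
# the custom CodeGen2.5 tokenizer.
name = "Salesforce/codegen25-7b-multi_P"
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
sample = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(sample[0], skip_special_tokens=True))
```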
## XDoc Base SQuAD2.0
MIT · microsoft · 36 downloads · 1 like · Tags: Large Language Model, Transformers

XDoc is a unified pre-training model that handles documents in different formats with a single model. With only 36.7% of the parameters of individually pre-trained models, XDoc achieves comparable or better downstream performance, making it significantly more cost-effective to deploy. This checkpoint is the base model fine-tuned on SQuAD 2.0.
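
Given the SQuAD 2.0 suffix, this checkpoint is presumably set up for extractive question answering. The sketch below assumes it loads through the stock transformers QA head, which is not guaranteed: XDoc ships from the microsoft/unilm codebase and may require its custom code.

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Assumption: the checkpoint exposes a standard extractive-QA head.
name = "microsoft/xdoc-base-squad2.0"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForQuestionAnswering.from_pretrained(name)

question = "Who released XDoc?"
context = "XDoc is a unified pre-training model released by Microsoft."
inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Decode the highest-scoring answer span.
start = outputs.start_logits.argmax()
end = outputs.end_logits.argmax()
print(tokenizer.decode(inputs["input_ids"][0][start : end + 1]))
```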
## T5-Efficient-TINY-FF12000
Apache-2.0 · google · 16 downloads · 0 likes · Tags: Large Language Model, English

T5-Efficient-TINY-FF12000 is a variant of Google's original T5 from the deep-narrow architecture study, which found that among models of similar parameter count, deeper and narrower configurations deliver better downstream performance. The FF12000 suffix sets the feed-forward dimension to 12,000.
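
The T5-Efficient checkpoints in this list (this entry and those below) are English-only and pretrained on C4 without supervised fine-tuning, so each needs task-specific fine-tuning before use. A minimal loading sketch, with the repo id inferred from the entry name:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Repo id inferred from the entry name; the checkpoint is pretrained-only
# and must be fine-tuned on a downstream task before it is useful.
name = "google/t5-efficient-tiny-ff12000"
tokenizer = AutoTokenizer.from_pretrained(name)
model = T5ForConditionalGeneration.from_pretrained(name)
print(f"parameters: {model.num_parameters():,}")
```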
## T5-Efficient-SMALL-KV32
Apache-2.0 · google · 16 downloads · 0 likes · Tags: Large Language Model, English

T5-Efficient-SMALL-KV32 is a variant of Google's original T5 from the deep-narrow architecture study aimed at improving downstream task performance. The KV32 suffix sets the key/value projection dimension to 32.
## T5-Efficient-SMALL-DM768
Apache-2.0 · google · 49 downloads · 1 like · Tags: Large Language Model, English

T5-Efficient-SMALL-DM768 is a variant of Google's original T5 from the deep-narrow architecture study. The DM768 suffix sets the model dimension (d_model) to 768.
## T5-Efficient-SMALL
Apache-2.0 · google · 1,032 downloads · 4 likes · Tags: Large Language Model, English

T5-Efficient-SMALL is a deep-narrow variant of Google's original T5 that outperforms other architectures of similar parameter count on downstream tasks.
## Chinese Legal ELECTRA Small Generator
Apache-2.0 · hfl · 14 downloads · 4 likes · Tags: Large Language Model, Transformers, Chinese

Chinese ELECTRA is a series of Chinese pre-trained models released by the HIT & iFLYTEK Joint Lab (HFL), based on Google's ELECTRA, offering compact size with performance competitive with much larger models. This checkpoint is the generator component of the small legal-domain model.
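
Since this checkpoint is the generator half of an ELECTRA model, a fill-mask call is a plausible smoke test; whether the standard transformers pipeline works with this repo id is an assumption.

```python
from transformers import pipeline

# Assumption: the generator checkpoint works with the stock fill-mask
# pipeline under this repo id.
fill_mask = pipeline(
    "fill-mask",
    model="hfl/chinese-legal-electra-small-generator",
)

# "The defendant confessed to the crime of [MASK]."
for candidate in fill_mask("被告人对[MASK]罪供认不讳。"):
    print(candidate["token_str"], round(candidate["score"], 4))
```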
## T5-Efficient-BASE
Apache-2.0 · google · 735 downloads · 10 likes · Tags: Large Language Model, English

T5-Efficient-BASE is a variant of Google's T5 featuring a deep-narrow design optimized for downstream task performance, with 222.9 million parameters.
## T5-Efficient-SMALL-KV256
Apache-2.0 · google · 16 downloads · 0 likes · Tags: Large Language Model, English

T5-Efficient-SMALL-KV256 is a variant of Google's T5 from the deep-narrow architecture study, with about 117 million parameters. The KV256 suffix sets the key/value projection dimension to 256. Like the other T5-Efficient checkpoints, it is pretrained-only and requires fine-tuning before use.
## T5-Efficient-SMALL-NL22
Apache-2.0 · google · 17 downloads · 0 likes · Tags: Large Language Model, English

T5-Efficient-SMALL-NL22 is a deep-narrow variant of Google's T5 that improves downstream task performance by increasing model depth; the NL22 suffix raises the number of layers to 22.
## T5-Efficient-TINY
Apache-2.0 · google · 8,337 downloads · 26 likes · Tags: Large Language Model, English

T5-Efficient-TINY is a deep-narrow variant of Google's T5 that focuses on improving downstream task performance by increasing model depth rather than width.
## T5-Efficient-MINI
Apache-2.0 · google · 946 downloads · 6 likes · Tags: Large Language Model, English

T5-Efficient-MINI is a variant of Google's original T5 with a deep-narrow architecture that delivers superior downstream task performance among models of similar parameter count.
## T5-Efficient-TINY-NL2
Apache-2.0 · google · 334 downloads · 0 likes · Tags: Large Language Model, English

T5-Efficient-TINY-NL2 is a variant of Google's original T5 from the deep-narrow architecture study; the NL2 suffix sets the number of layers to 2.
## T5-Efficient-TINY-NL8
Apache-2.0 · google · 25 downloads · 5 likes · Tags: Large Language Model, English

T5-Efficient-TINY-NL8 is an efficient variant of Google's T5 from the deep-narrow architecture study; the NL8 suffix sets the number of layers to 8.
## DeBERTa-v3-xsmall
MIT · microsoft · 87.40k downloads · 43 likes · Tags: Large Language Model, Transformers, English

DeBERTaV3 is Microsoft's improved version of DeBERTa. It replaces masked-language-model pretraining with ELECTRA-style replaced-token detection and introduces gradient-disentangled embedding sharing, improving pretraining efficiency and delivering excellent results on natural language understanding tasks.
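
A minimal sketch of loading the backbone with a fresh classification head for fine-tuning; `num_labels=2` is an arbitrary illustration, and the tokenizer requires the sentencepiece package.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# num_labels=2 is illustrative; pick the label count of your task.
name = "microsoft/deberta-v3-xsmall"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

inputs = tokenizer("A compact backbone for NLU fine-tuning.", return_tensors="pt")
print(model(**inputs).logits.shape)  # torch.Size([1, 2])
```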
## T5-Efficient-LARGE-DM2000
Apache-2.0 · google · 16 downloads · 0 likes · Tags: Large Language Model, English

T5-Efficient-LARGE-DM2000 is a variant of Google's T5 from the deep-narrow architecture study; the DM2000 suffix sets the model dimension (d_model) to 2,000.
## T5-Efficient-BASE-NL48
Apache-2.0 · google · 14 downloads · 1 like · Tags: Large Language Model, English

T5-Efficient-BASE-NL48 is a variant of Google's T5 that prioritizes increasing model depth to enhance downstream task performance; the NL48 suffix raises the number of layers to 48.